Pooling Hybrid Representations for Web Structured Data Annotation

Authors

  • Luciano Barbosa
  • Breno W. Carvalho
  • Bianca Zadrozny
Abstract

Automatically identifying data types of web structured data is a key step in the process of web data integration. Web structured data is usually associated with entities or objects in a particular domain. In this paper, we aim to map attributes of an entity in a given domain to pre-specified classes of attributes in the same domain based on their values. To perform this task, we propose a hybrid deep learning network that relies on the format of the attributes’ values. It does so without any pre-processing or using predefined hand-crafted features. The hybrid network combines sequence-based neural networks, namely convolutional neural networks (CNN) and recurrent neural networks (RNN), to learn the sequence structure of attributes’ values. The CNN captures short-distance dependencies in these sequences through a sliding window approach, and the RNN captures long-distance dependencies by storing information of previous characters. These networks create different vector representations of the input sequence which are combined using a pooling layer. This layer applies a specific operation on these vectors in order to capture their most useful patterns for the task. Finally, on top of the pooling layer, a softmax function predicts the label of a given attribute value. We evaluate our strategy in four different web domains. The results show that the pooling network outperforms previous approaches, which use some kind of input pre-processing, in all domains.
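The pipeline described in the abstract (character sequence, CNN and RNN representations, pooling, softmax) can be sketched as a forward pass in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which pooling operation is used, so element-wise max is assumed here, and all sizes, weight names, and the untrained random weights are hypothetical.

```python
import math
import random

random.seed(0)

# All sizes below are illustrative assumptions, not values from the paper.
VOCAB = 128    # assumed: characters are clipped to the ASCII range
EMB = 8        # assumed character embedding size
HID = 6        # assumed shared output size of the CNN and the RNN
WINDOW = 3     # assumed width of the CNN sliding window
CLASSES = 4    # assumed number of attribute classes


def rand_matrix(rows, cols):
    """Random (untrained) weights; a real model would learn these."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


E = rand_matrix(VOCAB, EMB)               # character embeddings
W_conv = rand_matrix(HID, WINDOW * EMB)   # CNN filters
W_in = rand_matrix(HID, EMB)              # RNN input weights
W_rec = rand_matrix(HID, HID)             # RNN recurrent weights
W_out = rand_matrix(CLASSES, HID)         # softmax layer weights


def embed(text):
    """Map raw characters to vectors; no pre-processing of the value."""
    return [E[min(ord(ch), VOCAB - 1)] for ch in text]


def cnn_vector(embs):
    """Short-distance patterns: slide a window, then take the max over positions."""
    pad = [[0.0] * EMB] * (WINDOW - 1)    # padding so short values still yield windows
    padded = pad + embs + pad
    feats = []
    for i in range(len(padded) - WINDOW + 1):
        window = [x for e in padded[i:i + WINDOW] for x in e]
        feats.append([math.tanh(a) for a in matvec(W_conv, window)])
    return [max(col) for col in zip(*feats)]


def rnn_vector(embs):
    """Long-distance patterns: a simple recurrent state carried across characters."""
    h = [0.0] * HID
    for e in embs:
        h = [math.tanh(x + y) for x, y in zip(matvec(W_in, e), matvec(W_rec, h))]
    return h


def pool(vectors):
    """Combine the representations; element-wise max is an assumption."""
    return [max(col) for col in zip(*vectors)]


def softmax(z):
    m = max(z)
    exps = [math.exp(x - m) for x in z]
    total = sum(exps)
    return [x / total for x in exps]


def predict(text):
    """Return a probability distribution over the assumed attribute classes."""
    embs = embed(text)
    pooled = pool([cnn_vector(embs), rnn_vector(embs)])
    return softmax(matvec(W_out, pooled))


print(predict("2016-10-03"))
```

Calling predict on a raw attribute value such as "2016-10-03" returns one probability per class. With random weights the distribution is meaningless; the sketch only shows how the two sequence representations meet in the pooling layer before classification.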

Related articles

HAWK - Hybrid Question Answering Using Linked Data

The decentralized architecture behind the Web has led to pieces of information being distributed across data sources with varying structure. Hence, answering complex questions often requires combining information from structured and unstructured data sources. We present HAWK, a novel entity search approach for Hybrid Question Answering based on combining Linked Data and textual data. The approach u...

Improving Web Search Ranking by Incorporating Structured Annotation of Queries

Web users are increasingly looking for structured data, such as lyrics, jobs, or recipes, using unstructured queries on the web. However, retrieving relevant results from such data is a challenging problem due to the unstructured language of web queries. In this paper, we propose a method to improve web search ranking by detecting structured annotations of queries based on top search results....

Knowledge Extraction for Hybrid Question Answering

Since the proposal of hypertext by Tim Berners-Lee to his employer CERN on March 12, 1989, the World Wide Web has grown to more than one billion Web pages and is still growing. With the later proposed Semantic Web vision [1], Berners-Lee et al. suggested an extension of the existing (Document) Web to allow better reuse, sharing and understanding of data. Both the Document Web and the Web of Data (which is ...

Annotation-Based Automatic Action Processing

Driven largely by search engine optimization, the amount of structured data on the web is growing rapidly. The main search engine providers promise a large increase in visibility to pages that annotate their content with the schema.org vocabulary, thereby exposing it as structured data. But besides its use by search engines, the data can be used in various...

Annotation as Process, Thing, and Knowledge: Multi-domain studies of structured data annotation

Following Buckland’s (1991) work on the nature of information, this paper characterizes the multi-faceted concept of ‘annotation’ as process, thing, and knowledge. This typology is then used to enumerate general research questions for the exploration of annotation in arbitrary domains. Our research team’s investigation of annotation of structured data in specific domains and user groups is desc...

Journal title:
  • CoRR

Volume abs/1610.00493  Issue

Pages  -

Publication date 2016